Adaptive-sampling algorithms for answering aggregation queries on Web sites
Authors
Abstract
Many Web sites publish their data in a hierarchical structure. For instance, Amazon.com organizes its pages on books as a hierarchy, in which each leaf node corresponds to a collection of pages of books in the same class (e.g., books on Data Mining). Users can easily browse this class by following a path from the root to the corresponding leaf node, such as “Computers & Internet – Databases – Storage – Data Mining”. Business applications often need to submit aggregation queries on such data, such as “finding the average price of books on Data Mining”. However, computing the exact answer to such a query is expensive because of the large amount of data, its dynamic nature, and limited Web-access resources. In this paper, we study how to answer such aggregation queries approximately, with quality guarantees, using sampling. We study adaptive-sampling techniques that allocate resources adaptively based on partial samples retrieved from different nodes in the hierarchy. Based on statistical methods, we study how to estimate the quality of the answer using the sample. Our experimental study using real and synthetic data sets validates the proposed techniques. © 2007 Elsevier B.V. All rights reserved.
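The abstract does not spell out the algorithm, so the following is only a minimal sketch of one generic way to "allocate resources adaptively based on partial samples": a two-phase stratified sample with Neyman-style allocation and a normal-approximation confidence interval. It is not the authors' method. The function name `adaptive_mean_estimate`, its parameters (`budget`, `pilot`, `z`), and the in-memory `strata` dictionary standing in for pages fetched from leaf nodes are all assumptions made for illustration.

```python
import math
import random
import statistics


def adaptive_mean_estimate(strata, budget, pilot=5, z=1.96, seed=0):
    """Estimate the overall mean of values spread over several leaf nodes.

    strata: dict mapping a (hypothetical) leaf-node name to a non-empty list
            of numbers; it stands in for items that would be fetched remotely.
    budget: total number of items we can afford to retrieve.
    Returns (estimate, half_width_of_interval, samples_taken_per_leaf).
    """
    rng = random.Random(seed)
    total = sum(len(values) for values in strata.values())

    # Phase 1: draw a small pilot sample (without replacement) from every leaf.
    samples, remaining = {}, {}
    for name, values in strata.items():
        shuffled = list(values)
        rng.shuffle(shuffled)
        k = min(pilot, len(shuffled))
        samples[name], remaining[name] = shuffled[:k], shuffled[k:]

    left = max(0, budget - sum(len(s) for s in samples.values()))

    # Phase 2: spend the rest of the budget where the pilot suggests the most
    # uncertainty, weighting each leaf by (leaf size) * (pilot std. deviation).
    weights = {}
    for name, values in strata.items():
        s = statistics.stdev(samples[name]) if len(samples[name]) > 1 else 0.0
        weights[name] = len(values) * s
    weight_sum = sum(weights.values())
    for name in strata:
        share = weights[name] / weight_sum if weight_sum > 0 else 1 / len(strata)
        extra = min(int(left * share), len(remaining[name]))
        samples[name] += remaining[name][:extra]

    # Combine per-leaf means, weighted by leaf size, and estimate the variance
    # of the combined estimate with a finite-population correction.
    mean, var = 0.0, 0.0
    for name, values in strata.items():
        n_h, big_n_h = len(samples[name]), len(values)
        w = big_n_h / total
        mean += w * statistics.fmean(samples[name])
        if n_h > 1:
            s2 = statistics.variance(samples[name])
            var += w * w * (s2 / n_h) * (1 - n_h / big_n_h)

    half_width = z * math.sqrt(var)
    counts = {name: len(s) for name, s in samples.items()}
    return mean, half_width, counts
```

With one leaf whose values are all identical and another whose values vary widely, the pilot phase sees zero spread in the first leaf, so the whole remaining budget goes to the second. The returned half-width is a rough quality indicator under a normal approximation, not a formal guarantee.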
Similar resources
Answering Aggregation Queries on Hierarchical Web Sites Using
Many Web sites publish their data in a hierarchical structure. For instance, Amazon.com organizes its pages about books as a hierarchy, in which each leaf node corresponds to a collection of pages of books in the same class (e.g., books on Data Mining). This class can be browsed easily by users following a path from the root to the corresponding leaf node, such as “Computers & Internet — Databa...
Efficient Ad-hoc Approximate Query Processing in Peer-to-Peer Databases
This paper appeared in the 22nd International Conference on Data Engineering (ICDE), Atlanta, Georgia, 2006. Abstract: Peer-to-peer databases are becoming prevalent on the Internet for distribution and sharing of documents, applications, and other digital media. The problem of answering large scale, ad-hoc analysis queries – e.g., aggregation queries – on these databases poses unique challenge...
Overcoming Limitations of Sampling for Aggregation Queries
We study the problem of approximately answering aggregation queries using sampling. We observe that uniform sampling performs poorly when the distribution of the aggregated attribute is skewed. To address this issue, we introduce a technique called outlier-indexing. Uniform sampling is also ineffective for queries with low selectivity. We rely on weighted sampling based on workload information ...
Complexity of Answering Counting Aggregate Queries over DL-Lite
The ontology-based data access model assumes that users access data by means of an ontology, which is often described in terms of description logics. As a consequence, languages for managing ontologies now need algorithms not only to decide standard reasoning problems, but also to answer database-like queries. However, fundamental database aggregate queries, such as the ones using functions COUN...
A New Model for Phrase Search Based on Weighted Minimum Displacement
Finding high-quality web pages is one of the most important tasks of search engines. The relevance between the documents found and the query searched depends on the user observation and increases the complexity of ranking algorithms. The other issue is that users often explore just the first 10 to 20 results while millions of pages related to a query may exist. So search engines have to use sui...
Journal title:
- Data Knowl. Eng.
Volume 64
Publication date: 2008